Add model merge #721
Conversation
Pull request overview
This pull request introduces a utility script for merging two nanoGPT model checkpoints with flexible normalization options. The script supports L2-normalized merging (the default behavior), simple averaging without normalization, and an option to skip final normalization for embedding and language model head weights.
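As a rough illustration of the two averaging strategies described above, a minimal sketch of the merge loop might look like the following. This is not the actual code from `model_merge.py`; the function name `merge_state_dicts` and the per-tensor normalization scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def merge_state_dicts(sd_a: dict, sd_b: dict, use_l2_norm: bool = True) -> dict:
    """Hypothetical sketch: average two state dicts, optionally after
    L2-normalizing each tensor so both checkpoints contribute on the
    same scale (use_l2_norm=False gives plain averaging)."""
    merged = {}
    for key, a in sd_a.items():
        b = sd_b[key]  # a missing key here would surface as a KeyError
        assert a.shape == b.shape, f"shape mismatch for {key}"
        if use_l2_norm and a.ndim > 0:
            # Normalize along the last dimension before averaging;
            # 0-dim tensors are skipped since they have no dim to normalize.
            a = F.normalize(a.float(), p=2, dim=-1)
            b = F.normalize(b.float(), p=2, dim=-1)
        merged[key] = (a + b) / 2.0
    return merged
```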
Changes:
- Added `model_merge.py`, a utility script that merges two checkpoint files with configurable L2 normalization and averaging strategies
- Added `demos/model_merge_demo.sh`, demonstrating the three main usage patterns of the merge script
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `model_merge.py` | New utility for merging two nanoGPT checkpoints with L2 normalization, simple averaging, and selective layer normalization skipping |
| `demos/model_merge_demo.sh` | Demo script illustrating typical merge operations with different normalization options |
```python
checkpoint_a.pop("optimizer", None)
checkpoint_a.pop("scheduler", None)
```
Copilot AI · Jan 15, 2026
The code mutates `checkpoint_a` directly, but lines 132-135 may reassign it. If `checkpoint_a` is reassigned to be just the state_dict (line 135), the `.pop()` calls will fail since a state_dict doesn't have `optimizer` or `scheduler` keys. Consider handling this more explicitly by checking `isinstance(checkpoint_a, dict)` before attempting to pop these keys, similar to the check on line 139.
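A minimal sketch of the guard the comment suggests, assuming a full checkpoint is a dict carrying training state beside the weights; the wrapper name `strip_training_state` is hypothetical:

```python
from typing import Any

def strip_training_state(checkpoint_a: Any) -> Any:
    """Sketch of the suggested defensive handling: only pop training
    state when checkpoint_a is actually a dict (assumption: mirrors
    the isinstance check the review points to on line 139)."""
    if isinstance(checkpoint_a, dict):
        # pop() with a default is already a no-op when keys are absent,
        # so this only needs to guard against non-dict inputs.
        checkpoint_a.pop("optimizer", None)
        checkpoint_a.pop("scheduler", None)
    return checkpoint_a
```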
```python
def l2_normalize(tensor: torch.Tensor, dim: int = L2_NORM_DIM) -> torch.Tensor:
    if tensor.ndim == 0:
        return tensor
    if tensor.ndim == 1:
        dim = 0
```
Copilot AI · Jan 15, 2026
The `l2_normalize` function has special handling for scalar (0-dim) and vector (1-dim) tensors but lacks documentation explaining this behavior. Add a docstring documenting that: (1) scalar tensors are returned unchanged, (2) 1-dim tensors normalize along `dim=0`, and (3) higher-dim tensors use the provided `dim` parameter (default `-1`).
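A sketch of what that documented version could look like; since the diff hunk above is truncated, the final `normalize` call and `L2_NORM_DIM = -1` are assumptions rather than the script's actual implementation:

```python
import torch
import torch.nn.functional as F

L2_NORM_DIM = -1  # assumed default, inferred from "default=-1" in the comment

def l2_normalize(tensor: torch.Tensor, dim: int = L2_NORM_DIM) -> torch.Tensor:
    """L2-normalize a tensor, with shape-dependent behavior.

    - 0-dim (scalar) tensors are returned unchanged.
    - 1-dim tensors are always normalized along dim=0, ignoring ``dim``.
    - Higher-dim tensors are normalized along ``dim`` (default -1).
    """
    if tensor.ndim == 0:
        return tensor
    if tensor.ndim == 1:
        dim = 0
    return F.normalize(tensor, p=2, dim=dim)  # assumed implementation
```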
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request introduces a new utility script for merging two nanoGPT model checkpoints with flexible options for normalization and averaging. The main addition is the `model_merge.py` script, which supports L2-normalized merging, skipping normalization for specific layers, and a simple averaging mode. A demo shell script is also provided to illustrate usage.

New model merging functionality:
- Added `model_merge.py`, a utility script for merging two nanoGPT checkpoints with options for L2 normalization, skipping final normalization for `wte`/`lm_head` weights, and simple averaging without normalization. The script handles key mismatches, shape validation, and preserves metadata.

Demo and usage examples:
- Added `demos/model_merge_demo.sh`, a shell script demonstrating typical usage patterns for `model_merge.py`, including L2-normalized merge, skipping final normalization for specific layers, and simple averaging.
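The selective skip for `wte`/`lm_head` weights might be implemented along these lines; this is a hypothetical sketch, since the actual key-matching logic and flag name in `model_merge.py` are not shown in this diff:

```python
# Hypothetical illustration: with skip_final_norm set, embedding and
# LM-head weights are merged by plain averaging rather than
# L2-normalized averaging. The substring matching is an assumption.
SKIP_NORM_SUBSTRINGS = ("wte", "lm_head")

def should_skip_norm(param_name: str, skip_final_norm: bool) -> bool:
    return skip_final_norm and any(s in param_name for s in SKIP_NORM_SUBSTRINGS)
```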